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Abstract. The present research work proposes a new fast fixed-point averaging algorithm on the 
compact Stiefel manifold based on a mixed retraction/lifting pair. Numerical comparisons between 
fixed-point algorithms based on the proposed non-associated retraction/lifting map pair and two 
associated retraction/lifting pairs confirm that the averaging algorithm based on a combination of 
mixed maps is remarkably less computationally demanding than the same averaging algorithm based 
on any of the constituent associated retraction/lifting pairs. 

Key words. Compact Stiefel manifold, Empirical averaging, Kolmogoroff-Nagumo mean, Man- 
ifold retraction/lifting maps. 



1. Introduction. The question about how to define the notion of mean value 
over a set of structured samples has no intrinsic answer. Even in the simple case of a 
data set made of two positive real-valued numbers, there exist a variety of methods 
to define their mean value, each of which possesses different properties and leads to a 
different numerical result. The best known averages of two numbers x, y > are the 
arithmetic mean ^(x + y), the harmonic mean and the geometric mean y/xy. A 
generalization of these is the Heinz mean ^(x a y 1 ~ a + y a x l ~ - a ), with < a < \, that 
interpolates between the arithmetic (a = 0) and the geometric (a = |) mean. Another 
averaging formula is given by the Heronian mean i (x + y/xy + y) , that corresponds 
to a weighted sum of the arithmetic and the geometric means of the positive numbers 
x and y. Most averaging formulas lead to instances of the Chisini mean as defined 
in [7]. The Chisini mean provides a good illustration of the fact that it is possible to 
define a mean value of a set of numbers without any particular requirement about, 
e.g., convexity. A function / of N real- valued variables leads to a Chisini mean value 
fi if, for every tuple (xi, X2, ■ • • , Xjsf), there exists a unique \i such that 

f(x 1 ,X 2 ,...,X N ) = f(fi,(J,,...,fl). (1.1) 

The definition of Chisini mean is really general and there were discussions whether it 
could be taken as a valid definition of mean (for a discussion, see, e.g., [H]). But note 

that, for example, the arithmetic mean fia^jj J2k=i x k of a tuple (xi, x%, ■ ■ ■ , xn) is 

a Chisini mean with f(x%, x%, . . . , xn) = x\ + x 2 + ■ ■ ■ + xjq- In fact, with this choice 
of the function /, the equation (|1.1|) reads: 

N N 

J> fc = ]T,i = A^. (1.2) 



fe=i fc=i 
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A rather general definition of mean value of a set of real- valued numbers led to the 
Kolmogoroff-Nagumo (or quasi- arithmetic) mean 




^2<P 1 ( x k) 



N 



(1.3) 



for a continuous strictly monotonic function ip (for a review, see |19j). In the case of 
structured samples, such as structured matrices, the question about how to extend 
the definition of Kolmogoroff-Nagumo mean is rather involved. The Kolmogoroff- 
Nagumo mean turns out to be a special case of the Chisini mean. In fact, upon 
defining f(x\, X2, ■ ■ ■ , x^r) =ip^ 1 (xi) + <fi~ 1 {x2) + • ■ • + (p~ l (x^), it is readily verified 
that the mean //kn is the unique solution to the Chisini equation (jl.ip 



Structured matrices appear in a variety of settings such as matrix-based optimiza- 
tion problems PQ, medical imaging [5], statistical analysis on manifolds [5], adaptive 
filtering [9] , computational ophthalmology |10j and radio polarimetry |13j . The empir- 
ical mean is perhaps the most useful statistical characterization of a set of structured 
matrices [H [5J [HI [13 [5D] , along with the empirical variance. When the matrices to 
average obey internal constraints such as orthogonality, the result of simple summa- 
tion and division by the number of summands, which would represent an arithmetic 
mean, does not obey the same constraints, in general. Therefore, in order to obtain 
an empirical arithmetic mean of orthogonal matrices, it is necessary to establish a 
calculation method by considering the geometric structure of the matrix-space that 
the matrices to average belong to. 

To solve this problem, Kaneko, Tanaka and Fiori |15) constructed an averaging 
algorithm on the compact Stiefel manifold which is a non-trivial extension of averaging 
algorithms on Lie-group- type manifolds presented in [TT] . The underlying idea behind 
the algorithms developed in the contributions [15] is that the sample-points on the 
Stiefel manifold are mapped onto a tangent space, where the average is taken, and 
then the obtained average point on the tangent space is brought back to the Stiefel 
manifold, via appropriate maps which are referred to as a retraction map and a lifting 
map. A variety of algorithms on the compact Stiefel manifold utilizing such maps 
were explained in |16) . whose convergence features and computational runtime were 
compared and which were applied to averaging real- world samples. 

The key point is that, in order to construct an averaging algorithm on the compact 
Stiefel manifold, it is necessary to construct an appropriate retraction/lifting pair. 
The averaging algorithms proposed in [16] utilize QR-decomposition-based, Cayley- 
transform-based, polar-decomposition-based and orthographic-type retraction/lifting 
pairs. All these algorithms, however, suffer of a bloating of computational demand 
with the increase of the size of the processed matrices. For example, in the case 
of the polar-decomposition-based retraction/ lifting pair, the retraction map may be 
expressed in closed form, while the computation of the associated lifting map requires 
solving a continuous-time algebraic Riccati equation. 

In the present paper, in order to ease the computational demand of the previously- 
introduced class of averaging algorithms, a new Kolmogoroff-Nagumo-type averaging 
algorithm is proposed which exploits a combination of a closed form polar retraction 



N 



N 




(1.4) 



2 



map and a closed form orthographic lifting map. The implication of the proposed 
choice are evaluated analytically as well as numerically. 

2. Averaging algorithm based on a combination of retraction/lifting 
pairs. The aim of the present section is to build-up an averaging algorithm on the 
compact Stiefel manifold based on the notion of mixed manifold retraction/lifting pair. 
The algorithm, stemming from a non-linear matrix-type equation, is implemented by 
a fast fixed-point iteration scheme. 

The compact Stiefel manifold [8] is defined by: 

St(p, nf^{X G R pxn \X T X = /„}, (2.1) 

where symbol /„ denotes a n x n identity matrix and n < p, namely, the manifold 
St(p, n) is the space of the 'tall-skinny' orthogonal matrices. Its tangent space at a 
point X € St(p, n) may be expressed as: 

T x St{p,n) = {V G R pxn \X T V + V T X = 0} . (2.2) 

Each tangent space is a vector space under standard matrix addition and multiplica- 
tion by a real scalar. 

The following measure of discrepancy 5: St(p, n)x St(p, n) — > between two 
Sticfcl-manifold matrices is made use of: 

6(X,Y) = \\I P ~X T Y\\ F , X,Y€St(p,n), (2.3) 

where symbol || • ||f denotes the Frobenius norm. 

A retraction at a point X G St(p, n) of a Stiefel manifold is defined as a map 
Px ■ TxSt(p,n) —> St(p, n) (for a complete definition see, e.g., [T]), while a map 
P^ 1 : St(p,n) -> T x St(p,n) such that PxC^x^Q)) = <2> for <3 e St(p,n), is termed 
lifting map (for a complete definition see, e.g., [16J. Both maps are defined only 
locally, therefore, hereafter it is assumed that X and Q lay sufficiently close to each 
other to evaluate P^ l {Q) and that V is sufficiently close to to evaluate P X (V). 

Given a point X G St(p, n) and a vector V G TxSt(p, n), the polar-decomposition 
retraction on the Stiefel manifold may be written in closed form pQ: 

P X (V) = (X + V)(I n + V T V)-?. (2.4) 

The associated polar-decomposition lifting is denoted by P^ 1 ■ 
The orthographic lifting map [2] may be defined as follows: 

P x 1 (Q) = 7T TxStM (Q-X), (2.5) 

where X, Q G St(p, n) and 7Tr s; st(p,n) : K px " — > TxSt(p,n) denotes a projection from 
the ambient space W xn into a tangent space TxSt(p, n). According to pQ, one such 
a projector is: 

K Tx st{ P ,n){A) = (I P - XX T )A - Xsk(X T A), (2.6) 

where A G W xn and sk(A) d =±(A T - A). Plugging equation $11^ into equation (|23]) 
gives: 

P x x (g) = (Jp - AA T )(Q - A) - Ask(A T (Q - A)) 

= (I p - XX T )Q + ^A(A T Q - Q T X). (2.7) 

The associated orthographic retraction map is denoted by Px- 
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2.1. Averaging method based on a mixed retraction/lifting pair. The 

following steps lead to an equation characterizing the unknown empirical mean matrix 
X G St(p, n), which represents an estimate of the actual center of mass C € St(p, n) 
on the basis of the available information: 

1. Map the points Xk € St(p, n) belonging to a neighborhood of the sought-for 
mean-matrix X € St(p, n) onto TxSt(p, n) by applying the lifting map P% . 

Denote such points as T4= f P^ 1 (Xfe). 

2. Compute the linear combination V — N^ 1 X\=i Vk- 

3. Bring back the mean vector V to St(p, n) by the retraction Px and get an 
empirical mean matrix X = Px (V) . 

Summarizing the above procedure, a mean matrix X 6 St(p, n) is the solution of the 
non-linear, matrix-type equation: 

X = P x(^f2 P x 1 ( X ^ (2-8) 

in the variable X. The analogy between the quasi-arithmetic mean for real numbers 
(|1.3p and the equation (|2.8|) is apparent, except that the equation (|2.8|) is an implicit 
function of the mean (as both sides of the equation depend on the mean value) instead 
of being an explicit function as in the case of Kolmogoroff-Nagumo mean. 

The equation (|2.8[) is solved numerically by means of a fixed-point iteration al- 
gorithm, that generates a sequence X"' € St(p, n) of estimates of the sought-for 
empirical mean matrix X, and that may be written as: 



i > 0, (2.9) 



where matrix X^ £ St(p, n) denotes an initial guess. In the previous contributions 
[15l[T6], the algorithm (12.91) was based on associated retraction/lifting pairs, namely, 
either Px/Px 1 or Px/Px 1 , while, in the present paper, the mixed retraction/lifting 
pair Px/Px 1 1S made use of with the aim to decrease the computational burden of 
the algorithm. 

Although, by definition of retraction/lifting pair, it holds Px ° P^ 1 = Px ° Pjr 1 = 
Ids t ( Pi „), it is recognized that the composition Px ° P^ 1 does not equal the identity 
map in St(p, n). In other terms, given matrices X, Q £ St(p, n), the discrepancy 

^x(Q) t =<^(-Px(-Px 1 (Q)), Q) differs from zero, in general. Such a discrepancy may be 
evaluated as follows. First, note that: 

Ax(Q) = ||/„ - Q T P x {P x l {Q))h, (2.10) 

define M d =Q T X and set V = (I p — XX T )Q + \X(X T Q - Q T X). It holds that: 

V T V = /„ + i(A-f T M - M 2T — M 2 — 3MM T ), (2.11) 
Q T (X + V)= I n + M - -M(M + M T ), (2.12) 



hence, the discrepancy (|2.10p takes on the expression: 



A = 



In- 
2I„- 



In 



M - -M(M + M T ) 



— (M — M T ) 2 



MM T 



(2.13) 



Note that when Q — X then M = I n and the above expression gives A = 0. 



2.2. Relationships with other contributions. The iteration rule (J2T9J) gen- 
eralizes the averaging methods proposed independently in [THl H2] ■ The fixed-point 
solution X of the iteration (|2.9|) satisfies the equation: 



N 



(2.14) 
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namely, the fixed point X is a zero of the vector field V(X) = X^fcLi -^x 1 ^*)- The 
Proposition 2.3 proved in [12] holds and ensures that, if the samples Xk are not too 
spread (see [T2] for a precise definition), the zero of the vector field V(X) is unique, 
hence, the iteration rule (|2.9[) is locally well-defined and leads to a unique average. 

The fixed-point-type iteration rule fl2.9[) also generalizes the averaging methods 
based on the minimization of a spread function based on the Riemannian distance. A 
complete study of the convergence properties of such methods, based on a gradient- 
descent- type optimization procedure, is available in [3]. An earlier study of an aver- 
aging method based on a Newton- type optimization procedure was proposed in [14] . 

The iteration rule (|2.9[) could be further generalized by introducing an unequal 
weighting scheme of the tangent- vectors P~l i} {Xk) as suggested in the contribution 
[18] (see Algorithm 2). Such weighting scheme would lead to an iteration rule of the 
type: 

X {1+1) = P xW (± £ wjpPxlo (X*)) > i > 0> (2-15) 

where weights wft > depend on the mutual distance from the samples and the 
current value of the iteration X^K 

3. Numerical experiments. In the present section, the results of different ex- 
periments are illustrated to get an insight into the numerical behavior of the discussed 
retraction/lifting map pairs in the context of averaging over the compact Stiefel man- 
ifold. In the numerical experiments, the center C € St(p, n) of the distribution of 
the samples Xk is generated by computing the Q-factor of a thin-QR decomposition 
of a matrix randomly generated in M. pxn with normally-distributed entries. The N 

samples to average are generated by the rule Xk — exp(<jfifc)C ', with flk = sk(Ak), 
where Ak is a matrix randomly generated in W xp with normally-distributed entries, 
and a > controls the spread of the distribution [TSl [16] . The initial guess m 
the fixed-point iteration algorithm (|2.9p was chosen by slightly rotating the sample 
Xi via a quasi-unit random rotation. The numerical tests were performed by running 
a MATLAB® 7 (64 bit) code on platform featuring an Intel® Xeon® (2.93 GHz) 
with 8 cores and 12GB RAM. 
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Fig. 3.1. Upper panel: Statistical distribution of the discrepancies Ax(Q) obtained by setting 
X = C (center of the distribution) and Q = (random samples). Lower panel: Statistical 
distribution of the discrepancies 5(X, Q), again with X = C and Q = Xk (in logarithmic scales). 



The first experiment aims at evaluating the numerical behavior of the compound 
map P x o Px 1 - Fi gure 13.11 illustrates the statistical distribution of the discrepancy 
AcP^fc) compared to the distribution of the discrepancies S(C, Xk)- N = 20, 000 sam- 
ples Xk G St(20,4) were generated by a spread factor of a = 0.05. The discrepancy 
values Ac(Xk) distribute approximately around 10~ 6 , confirming that the composi- 
tion Px o Px i although not being an identity map, numerically behaves similarly to 
an identity map in the considered range. Figure 13.21 shows the scatter plot of the val- 
ues of the discrepancies Ac(Xk) versus the values of the discrepancies S(C, Xk). The 
relationship between them has a positive-correlation trend, showing that the more the 
arguments Q and X deviate from each other, the more the compound map Px ° Px 
deviates from the identity map. 

The second experiment concerns the convergence properties of the fixed-point iter- 
ation algorithm (12.9[) based on the polar retraction and orthographic lifting (namely, 
the mixed retraction/ lifting pair Px / P^ 1 ), the orthographic retraction/lifting pair 
(Px/Px 1 ) an d the polar retraction/lifting pair (Px /Px 1 ). For this experiment, a 
number N = 30 of samples were generated on the manifold St(20, 4) with a spread 
parameter a — 0.2. Figure |3~31 shows the values of the index S(X^ l \ C). The three al- 
gorithms behave satisfactorily and converge to solution-matrices that locate at similar 
distances to the actual center of the distribution. 

The third experiment aims at illustrating a comparison about the computational 
complexity of the three algorithms. A close inspection of the fixed-point algorithm 
(I2.9p reveals that, for a general compact Stiefel manifold St(p, n), the computational 
complexity is essentially a function of the number n. The Figure 13.41 shows the 
runtimes corresponding to the tested algorithms run on the manifold St(100, n) with 
varying n. Such numerical simulation was performed with N — 50 samples generated 
with a spread-parameter value a = 0.01. Each averaging experiment for each value of 
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Fig. 3.2. Scatter plot of the values of the discrepancies Ax{Q) versus the values of the dis- 
crepancies 5(X, Q) obtained by setting X = C and Q = (in logarithmic scales). 
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Fig. 3.3. Experiment about averaging on the manifold St(20,4)- Index 5(X^> ,C) during itera- 
tion. 

the index n was repeated 100 times to get rid of random fluctuations in the evaluation 
of runtimes. The obtained results indicate that the averaging algorithm based on a 
combination of the polar retraction map and the orthographic lifting map is much 
lighter than averaging methods based on associated retraction/lifting pairs in terms 
of computational burden. Likewise, Figure 13.51 shows the runtimes corresponding to 
the tested algorithms run on the manifold St(p, 10) with varying p. Such numerical 
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Fig. 3.4. Runtimes for the manifold St(l00,n) with the integer parameter n varying. 



10' 



Manifold: St(p,10) 



Polar 

-Orthographic 
■Polar+Orthographic 



10° r 



1 10- 1 : 



10" 




40 60 

Rows (p) 



Fig. 3.5. Runtimes for the manifold St(p, 10) with the integer parameter p varying. 



simulation was performed with N = 50 samples generated with a spread-parameter 
value a = 0.01. The experiment for each p was repeated on 100 independent trials to 
get rid of random fluctuations. The obtained results indicate the little dependence of 
the computational complexity of the algorithm from the dimension p. 
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4. Conclusion. The present research work proposed a new averaging algorithm 
on the compact Stiefel manifold based on a combination of non-associated (mixed) 
retraction and lifting maps. The combined retraction/lifting pair was studied both 
analytically and numerically and the new averaging algorithm was tested numerically. 
The obtained numerical results confirm that the composition Px ° Px , although not 
being an identity map, numerically behaves like an identity map in the considered 
range and that the new algorithm behaves satisfactorily and converges to solution- 
matrices that deviate from the actual center of the distribution of a similar amount. 
A comparison between three fixed-point algorithms based on the non-associated polar 
retraction and orthographic lifting, on the associated orthographic retraction/lifting 
pair and on the associated polar retraction/lifting pair reveals that the averaging 
algorithm based on a combination of the polar retraction map and the orthographic 
lifting map is remarkably superior to the averaging algorithms based on associated 
retraction/lifting pairs in terms of computational demand, resulting the lightest one. 
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